An R Package for Accommodating Islands and Disjoint Zones in Areal Spatial Modelling
useR! Conference 2024, Salzburg, July 9
Kevin Horan, Maynooth University
Motivation
Create neighbourhood structure for England and Wales parliamentary constituencies
Motivation
Create neighbourhood structure for England and Wales parliamentary constituencies
Motivation
Include Scotland
Motivation
Creating neighbourhood structure
spdep::poly2nb()
sfdep::st_contiguity()
- necessary for
- exploratory tasks such e.g. LISA, or
- modelling spatial effects e.g. ICAR
- inexperienced users seeking help on stackoverflow etc
- even for experienced users, often want to have a visual component to creating these structures
Solutions
- remove islands from the model
- manually seek them out and link them
- maybe take an entirely different approach based on domain knowledge
sfislands Proposition
Package which encourages a workflow
pre-functions
- for setting up neighbourhood structure
post-functions
- for visualising model predictions
Pre-functions
| st_bridges() |
create a neighbourhood contiguity structure, with a k-nearest neighbours condition for islands |
| st_quickmap_nb() |
check structure visually on map |
| st_check_islands() |
check and record the contiguities which have been assigned to islands |
| st_manual_join_nb() |
make manual changes by adding connections |
| st_manual_cut_nb() |
make manual changes by removing connections |
Simple scenario
1. st_bridges()
- output:
sf dataframe with additional nb column containing named list of contiguities, with islands joined to k nearest neighbours
- additional arguments available for k, nb column type, attachment to dataframe, exclusion of islands
1. st_bridges()
Default: outputs original sf dataframe with additional named list column nb
st_bridges(rectangles,
"name") |>
head()
Simple feature collection with 5 features and 2 fields
Geometry type: POLYGON
Dimension: XY
Bounding box: xmin: 0 ymin: 0 xmax: 6 ymax: 4
CRS: NA
name nb geometry
1 Rect1 2, 3 POLYGON ((0 0, 0 2, 2 2, 2 ...
2 Rect2 1, 3, 4 POLYGON ((2 0, 2 2, 4 2, 4 ...
3 Rect3 1, 2, 5 POLYGON ((2 2, 2 4, 4 4, 4 ...
4 Rect4 2 POLYGON ((5 0, 5 1, 6 1, 6 ...
5 Rect5 3 POLYGON ((0.8 3, 0.8 4, 1.8...
1. st_bridges()
Alternatively, a matrix column
st_bridges(rectangles,
"name",
nb_structure = "matrix") |>
head()
Simple feature collection with 5 features and 2 fields
Geometry type: POLYGON
Dimension: XY
Bounding box: xmin: 0 ymin: 0 xmax: 6 ymax: 4
CRS: NA
name nb.1 nb.2 nb.3 nb.4 nb.5 geometry
1 Rect1 0 1 1 0 0 POLYGON ((0 0, 0 2, 2 2, 2 ...
2 Rect2 1 0 1 1 0 POLYGON ((2 0, 2 2, 4 2, 4 ...
3 Rect3 1 1 0 0 1 POLYGON ((2 2, 2 4, 4 4, 4 ...
4 Rect4 0 1 0 0 0 POLYGON ((5 0, 5 1, 6 1, 6 ...
5 Rect5 0 0 1 0 0 POLYGON ((0.8 3, 0.8 4, 1.8...
1. st_bridges()
Other variations (e.g. not in dataframe, remove islands)
st_bridges(
rectangles,
"name",
remove_islands = T,
nb_structure = "list",
add_to_dataframe = F) |>
head()
$Rect1
[1] 2 3
$Rect2
[1] 1 3
$Rect3
[1] 1 2
st_bridges(
rectangles,
"name",
remove_islands = T,
nb_structure = "matrix",
add_to_dataframe = F) |>
head()
[,1] [,2] [,3]
Rect1 0 1 1
Rect2 1 0 1
Rect3 1 1 0
2. st_quickmap_nb()
Visualise structure quickly
st_bridges(
rectangles,
"name",
link_islands_k = 1) |>
st_quickmap_nb()
st_bridges(
rectangles,
"name",
link_islands_k = 2) |>
st_quickmap_nb()
2. st_quickmap_nb()
Options: numeric nodes
st_bridges(rectangles,
"name",
link_islands_k = 1) |>
st_quickmap_nb(nodes = "numeric")
2. st_quickmap_nb()
Options: styling
st_bridges(rectangles, "name", link_islands_k = 1) |>
st_quickmap_nb(linkcol = "orange", bordercol = "darkblue",
pointcol = "pink", linksize = 0.5,
pointsize = 6, fillcol = "red",
bordersize = 1)
3. st_check_islands()
For transparency, shows summary of non-contiguous connections in a dataframe
st_bridges(
rectangles,
"name",
link_islands_k = 1) |>
st_check_islands()
| Rect4 |
4 |
2 |
Rect2 |
| Rect5 |
5 |
3 |
Rect3 |
st_bridges(
rectangles,
"name",
link_islands_k = 2) |>
st_check_islands()
| Rect4 |
4 |
2 |
Rect2 |
| Rect4 |
4 |
3 |
Rect3 |
| Rect5 |
5 |
1 |
Rect1 |
| Rect5 |
5 |
3 |
Rect3 |
4. st_manual_join_nb()
If we feel that 4 should also be connected to 3, this can be done manually.
st_bridges(
rectangles,
"name") |>
st_quickmap_nb(
nodes = "numeric")
st_bridges(
rectangles,
"name") |>
st_manual_join_nb(3,4) |>
st_quickmap_nb(
nodes = "numeric")
5. st_manual_cut_nb()
And perhaps there is a desert between rectangles 1 and 2 which justifies removing the connection. We will edit it this time using names.
st_bridges(
rectangles,
"name") |>
st_quickmap_nb(
nodes = "numeric")
st_bridges(rectangles, "name") |>
st_manual_join_nb(3,4) |>
st_manual_cut_nb("Rect1",
"Rect2") |>
st_quickmap_nb(
nodes = "numeric")
Model
This object, of class nb, can be used for brms, INLA, rstan etc.
- post-functions focus on
mgcv models
- flexible and versatile
- extraction of spatial predictions can be cumbersome
Post-functions
| st_augment() |
augment the original dataframe with model predictions |
| st_quickmap_preds() |
generate quick maps of these predictions |
st_augment() works with:
- random effects (which are called with
bs='re'), and
- ICAR components (
bs='mrf')
Creating an mgcv model
mgcv::gam(
# fixed intercept and effect for covariate
y ~ covariate +
# random intercept at level region
s(region, bs = "re") +
# random slopes at level region
s(region, covariate, bs = "re") +
# ICAR varying intercept at level sub-region
s(sub-region,
bs = 'mrf',
xt = list(nb = data$nb)) +
# ICAR varying slope for covariate at level sub-region
s(sub-region, by = covariate,
bs = 'mrf',
xt = list(nb = data$nb)),
data = data,
method = "REML")
Example 1: Indonesia earthquakes
Motivation:
- Indonesia: a place with many islands
- earthquakes: a phenomenon which does not respect land contiguity
Also:
![]()
Earthquakes in Indonesia of magnitude > 5.5, 1985-2023. Categorised by magnitude as medium, large or extra-large.
![]()
Earthquake incidence in Indonesia, 1985-2023, mag > 5.5: count per square kilometre by province.
![]()
Indonesia faults. Surrounded by a 10 kilometre buffer.
![]()
Indonesia fault concentration. Square kilometre of buffered fault per square kilometre of province area.
Step 1. Pre-functions
Simple default structure and map
st_bridges(provinces_df, "province",
link_islands_k = 2) |>
st_quickmap_nb(fillcol = "antiquewhite1", bordercol = "black",
bordersize = 0.5, linkcol = "darkblue",
linksize = 0.8, pointcol = "red",
pointsize = 2) +
theme(panel.background = element_rect(fill = "#ECF6F7",
colour = "black",
linewidth=1.5),
axis.text = element_blank()) +
geom_sf(data=nearby_countries_df, fill="gray50",
linewidth=0.5, colour="black")
Step 1. Pre-functions
Simple default structure and map
Step 1. Pre-functions
st_bridges(provinces_df, "province", link_islands_k = 2) |>
st_check_islands()
| Bali |
2 |
12 |
Jawa Timur |
| Bali |
2 |
21 |
Nusa Tenggara Barat |
| Bangka-Belitung |
3 |
9 |
Jambi |
| Bangka-Belitung |
3 |
31 |
Sumatera Selatan |
| Kepulauan Riau |
17 |
9 |
Jambi |
| Kepulauan Riau |
17 |
24 |
Riau |
| Maluku |
19 |
20 |
Maluku Utara |
| Maluku |
19 |
22 |
Nusa Tenggara Timur |
| Maluku Utara |
20 |
19 |
Maluku |
| Maluku Utara |
20 |
27 |
Sulawesi Tengah |
| Nusa Tenggara Barat |
21 |
2 |
Bali |
| Nusa Tenggara Barat |
21 |
22 |
Nusa Tenggara Timur |
| Nusa Tenggara Timur |
22 |
19 |
Maluku |
| Nusa Tenggara Timur |
22 |
21 |
Nusa Tenggara Barat |
Step 1. Pre-functions
numeric - don’t know names
concavehull - presence of multipolygons
st_bridges(provinces_df, "province",
link_islands_k = 2) |>
st_quickmap_nb(fillcol = "antiquewhite1", bordercol = "black",
bordersize = 0.5, linkcol = "tomato", linksize = 0.5,
numericcol = "black", numericsize = 6,
nodes = "numeric",
concavehull = TRUE, hullcol = "darkgreen",
hullsize = 0.4) +
theme(panel.background = element_rect(fill = "#ECF6F7",
colour = "black",
linewidth=1.5),
axis.text = element_blank())
Step 1. Pre-functions
numeric - don’t know names
concavehull - presence of multipolyons
Step 1. Pre-functions
Make manual cut/join alterations
st_bridges(provinces_df, "province", link_islands_k = 2) |>
st_manual_join_nb(3,13) |>
st_manual_join_nb(13,17) |>
st_manual_join_nb(14,25) |>
st_manual_join_nb(20,29) |>
st_manual_join_nb(19,23) |>
st_manual_join_nb(16,27) |>
st_manual_join_nb(22,23) |>
st_manual_join_nb(7,19) |>
st_manual_join_nb(7,20) |>
st_manual_join_nb(19,28) |>
st_manual_join_nb(4,18) |>
st_manual_join_nb(21,26) |>
st_manual_join_nb(22,28) |>
st_manual_cut_nb(19,22) |>
st_quickmap_nb()
Step 1. Pre-functions
Result of manual cut/join alterations
Step 2. Model
mod_pois_mrf <- gam(damaging_quakes_total ~
fault_concentration +
s(province,
bs='mrf', xt=list(nb=prep_data$nb), k=24) +
offset(log(area_province)),
data=prep_data,
method="REML",
family = "poisson")
Step 3. Post-functions
plot_mrf <- mod_pois_mrf |>
st_augment(prep_data) |>
st_quickmap_preds(scale_low = "darkgreen", scale_mid = "ivory",
scale_high = "darkred", scale_midpoint = 0)
Workflow summary
sf dataframe
| province |
| province_id |
| quake_density |
| damaging_quakes_total |
| damaging_quakes_density |
| area_fault_within |
| area_province |
| fault_concentration |
| geometry |
|> st_bridges()
| province |
| province_id |
| quake_density |
| damaging_quakes_total |
| damaging_quakes_density |
| area_fault_within |
| area_province |
| fault_concentration |
| nb |
| geometry |
|> st_augment()
| province |
| province_id |
| quake_density |
| damaging_quakes_total |
| damaging_quakes_density |
| area_fault_within |
| area_province |
| fault_concentration |
| nb |
| mrf.smooth.province |
| se.mrf.smooth.province |
| geometry |
Workflow summary
# 1. set up neighbourhood structure
prep_data <- st_bridges(provinces_df, "province")
# 2. check and edit if necessary using
# st_quickmap_preds(), st_manual_join_nb(), st_manual_cut_nb()
# 3. define model
mod <- gam(quake_mlxl_total ~ fault_concentration +
s(province, bs='mrf', xt=list(nb=prep_data$nb)) +
offset(log(area_province)),
data=prep_data, method="REML",family = "poisson")
# 4. augment tidy estimates
tidy_ests <- st_augment(mod, prep_data)
# 5. visualise them
st_quickmap_preds(tidy_ests)
Example 2: London
Motivation:
- no islands, but
- at smaller scale, e.g. city, geographical barriers such as rivers may be relevant
- river Thames flows through London
River and contiguities
st_bridges(london, "NAME") |>
st_quickmap_nb(bordercol="black", bordersize=0.5, linkcol="blue") +
geom_sf(data=thames, colour="darkblue", linewidth=1.5) +
theme(panel.background = element_rect(
fill="#F6F3E9", colour="black", linewidth=1.5))
Riverside units only
st_bridges(london[riverside,],"NAME") |>
st_quickmap_nb(linksize = 0.5, bordercol = "black",
bordersize=0.5,, linkcol="blue") +
geom_sf(data=thames, colour="darkblue", linewidth=1.5) +
annotation_scale(location="br") + coord_sf(datum=NA) +
theme(panel.background = element_rect(
fill="#F6F3E9", colour="black", linewidth=1.5))
River crossings
![]()
1 kilometre buffer around crossings shaded green.
Visualise: st_quickmap_nb(nodes = “numeric”)
![]()
Boroughs which are not within 1 kilometre of a crossing are shaded pink.
Edit: st_manual_cut_nb()
# manually cut the links where there is no crossing
st_bridges(london[riverside,], "NAME") |>
st_manual_cut_nb(18,17) |>
st_manual_cut_nb(19,17) |>
st_manual_cut_nb(19,20) |>
st_manual_cut_nb(20,21) |>
st_manual_cut_nb(21,22) |>
st_manual_cut_nb(22,48) |>
st_manual_cut_nb(47,48) |>
st_manual_cut_nb(45,46) |>
st_manual_cut_nb(45,47) |>
st_manual_cut_nb(39,65) |>
st_manual_cut_nb(1,2) |>
st_quickmap_nb(bordercol = "black", bordersize = 0.5) +
geom_sf(data=no_touch_buffer, fill = "pink", alpha = 0.3)
![]()
Riverside boroughs of Greater London. Contiguities across the river have been cut for the pink boroughs.
Example 3: Liverpool
Motivation:
- include spatial hierarchical structure
- combined with ICAR
- based on ONS data used in multilevel spatial modelling course at University of Liverpool
Nested administrative divisions
| OA |
output areas |
lowest level of geographical area for census statistics, usually containing 100 - 625 persons |
| LSOA |
lower layer super output areas |
usually 4 or 5 OAs |
| MSOA |
middle layer super output areas |
usually 4 or 5 LSOAs |
Unemployment and illness
OA level neighbours
st_bridges(liverpool, "oa_cd") |>
st_quickmap_nb(pointsize = 0.1, linksize = 0.1, fillcol = "gray90")
Pre-functions and model
prepliv <- st_bridges(liverpool,
"oa_cd")
liverpool_model <-
# fixed intercept and slope
gam(unemployment ~ lt_illness +
# MSOA level random intercept
s(msoa_cd, bs="re") +
# MSOA level random slope
s(msoa_cd, lt_illness, bs="re") +
# LSOA level random intercept
s(lsoa_cd, bs="re") +
# LSOA level random slope
s(lsoa_cd, lt_illness, bs="re") +
# OA level ICAR random coefficients
s(oa_cd, bs='mrf',
xt=list(nb=prepliv$nb),k=20),
data=prepliv, method="REML")
Post-functions
liverpool_plots <-
liverpool_model |>
st_augment(prepliv) |>
st_quickmap_preds()
ggarrange(
plotlist =
liverpool_plots
)
MSOA level random effects
LSOA level random effects
Conclusions
removes blockage issue of disconnected units for inexperienced users
enables interactive construction of neighbourhood structure
allows quick model comparison
can be extended to other modelling packages in future